Marketing Campaign Analysis of Portuguese banking institution

Project information

  • Category: Machine Learning & AI
  • Client: Academic Project
  • Project date: April 14, 2023
  • Collaboration 1: Sourabh Joshi
  • Collaboration 2: Prabhnoor Kaur


Banks commonly run direct marketing campaigns by phone to offer products such as term deposits, and the value of those campaigns depends on how well the bank can anticipate which clients will subscribe. This project analyzes data from the phone-based direct marketing campaigns of a Portuguese banking institution. The campaigns involved a large number of client contacts, and each record ends in a binary outcome: the client either subscribed to a term deposit ('yes') or did not ('no'). The datasets come from the UCI Machine Learning Repository and describe each marketing interaction through a range of features.

The analysis looks for patterns in these records that indicate how likely a client is to subscribe to a term deposit, and builds predictive models to assess how effective the marketing effort is. The features cover both the phone contacts themselves and client demographics. The practical aim is to help the bank refine its strategy, allocate resources judiciously, and improve marketing efficiency.

Problem Statement:

Our task is a binary classification problem: predict whether a client will subscribe to a term deposit ('yes' or 'no') based on the provided features. The prediction matters to the banking institution because it directly informs marketing strategy and resource allocation. Our goal is to develop machine learning models that make this prediction accurately and reliably.

With reliable predictions of subscription behavior, the bank can tailor its marketing efforts to the clients most likely to subscribe, allocate resources efficiently, and make better-informed decisions in its direct marketing campaigns.

Approach:

The project aims to analyze direct marketing campaign data from a Portuguese banking institution to predict client subscription to term deposits. This involves four major steps:

  1. Data preparation
  2. Random Forest and LPM Analysis
  3. GBM Implementation
  4. AdaBoost Implementation

Data Preparation:

  • Obtain and preprocess the dataset, addressing missing values and ensuring class balance.
  • Correct feature types as needed to ensure data integrity and compatibility with the chosen machine learning algorithms.

Random Forest and LPM Analysis:

  • Test AUC and CI:
    • Utilize Random Forest (RF) and Linear Probability Model (LPM) to predict the target variable, employing 100 bootstrap loops.
    • Report the test AUC along with its 95% confidence interval for both models.
  • Variable Importance (MDI):
    • Evaluate variable importance using Mean Decrease Impurity (MDI) and identify the top 6 predictors from 100 runs of the RF model.
  • OOB Confusion Matrix:
    • Analyze the out-of-bag (OOB) confusion matrix generated after 100 runs of the RF model to assess classification performance.
  • Comparison of AUCs:
    • Compare test AUCs obtained from cross-validation with the OOB AUCs to evaluate and contrast model performance.
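To make the AUC and confidence-interval step concrete, here is a minimal Python sketch. It is not the project's code: it computes a rank-based AUC, then resamples a fixed set of already-scored observations 100 times and takes a percentile interval, whereas the analysis described above fits RF and LPM within each bootstrap loop. The function names, the synthetic labels and scores, and the choice of a percentile interval are all assumptions made for illustration.

```python
import random
import statistics


def auc(labels, scores):
    """Rank-based AUC: chance that a random positive scores above a random negative."""
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        # Extend j over a run of tied scores so they share an average rank.
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc(labels, scores, n_loops=100, seed=0):
    """Resample (label, score) pairs with replacement and collect one AUC per loop."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_loops:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # an AUC needs both classes in the resample
            aucs.append(auc(y, [scores[i] for i in idx]))
    return aucs


def percentile_ci(values):
    """95% percentile interval: the 2.5th and 97.5th percentiles of the values."""
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]


# Synthetic stand-in data (NOT the bank dataset): 1 = 'yes', 0 = 'no'.
rng = random.Random(42)
labels = [1 if rng.random() < 0.12 else 0 for _ in range(500)]
scores = [rng.gauss(0.6 if y else 0.3, 0.15) for y in labels]

aucs = bootstrap_auc(labels, scores)
low, high = percentile_ci(aucs)
print(f"mean AUC {statistics.mean(aucs):.3f}, 95% CI ({low:.3f}, {high:.3f})")
```

A percentile interval is only one way to form the 95% confidence interval; a normal approximation (mean plus or minus 1.96 standard deviations) is another, and the two generally give slightly different bounds.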

GBM Implementation:

  • Hyperparameter Tuning:
    • Perform hyperparameter tuning, focusing on interaction.depth, shrinkage, and n.trees, through a grid search approach.
    • Report the validation AUC for the best-performing model identified.
  • Test with Bootstrapping:
    • Conduct 50 bootstrapping loops to test the best GBM model, reporting test AUCs along with their standard errors.
  • Identification of Top Predictors:
    • Identify the top predictors contributing significantly to the classification task using the winning GBM model.
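The grid search over the three hyperparameters can be sketched as follows. The parameter names in the text (interaction.depth, shrinkage, n.trees) match those of R's gbm package; the Python below only illustrates the search loop itself. The grid values are hypothetical, and toy_score is a stand-in for "fit a GBM with these settings and return its validation AUC", not a real model.

```python
from itertools import product


def grid_search(grid, score_fn):
    """Score every parameter combination and return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid; the values actually searched are not given in the text.
grid = {
    "interaction_depth": [1, 3, 5],
    "shrinkage": [0.01, 0.05, 0.1],
    "n_trees": [100, 500, 1000],
}


def toy_score(params):
    """Placeholder score that peaks at an arbitrary point in the grid.

    In a real run this would train a GBM with `params` on the training
    split and return the AUC on the validation split.
    """
    return -(
        abs(params["interaction_depth"] - 3)
        + 10 * abs(params["shrinkage"] - 0.05)
        + abs(params["n_trees"] - 500) / 1000
    )


best_params, best_score = grid_search(grid, toy_score)
print(best_params)
```

The winning combination would then be refit and evaluated in the bootstrapping loops, so that the reported test AUCs come from data not used to choose the hyperparameters.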

AdaBoost Implementation:

  • GBM with AdaBoost:
    • Apply AdaBoost.M1 with GBM utilizing the adaboost function, following a similar evaluation approach as with GBM.
  • JOUSBoost Implementation:
    • Implement AdaBoost using the JOUSBoost package, ensuring adherence to the requirements of the adaboost function and analyzing results accordingly.
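As an illustration of what the boosting step does, here is a small Python sketch of discrete two-class AdaBoost with threshold stumps. It is not the gbm or JOUSBoost implementation. We understand JOUSBoost's adaboost function to expect a numeric predictor matrix and responses coded as -1 and 1 (an assumption worth checking against the package documentation), so the sketch starts by encoding 'yes'/'no' that way. The toy feature values and all function names are hypothetical.

```python
import math


def encode_labels(raw):
    """Map 'yes'/'no' outcomes to +1/-1."""
    return [1 if value == "yes" else -1 for value in raw]


def stump_predict(stump, row):
    """A stump predicts `polarity` above the threshold and `-polarity` otherwise."""
    _, feature, threshold, polarity = stump
    return polarity if row[feature] > threshold else -polarity


def fit_stump(X, y, w):
    """Return the (error, feature, threshold, polarity) with lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                candidate = (0.0, feature, threshold, polarity)
                err = sum(
                    wi for wi, row, yi in zip(w, X, y)
                    if stump_predict(candidate, row) != yi
                )
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best


def adaboost(X, y, n_rounds):
    """Fit `n_rounds` stumps, reweighting observations after each round."""
    n = len(y)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified points gain weight, correctly classified points lose it.
        w = [
            wi * math.exp(-alpha * yi * stump_predict(stump, row))
            for wi, yi, row in zip(w, y, X)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy one-feature data (hypothetical): no single threshold separates the classes.
X = [[1], [2], [3], [4], [5], [6]]
y = encode_labels(["yes", "yes", "no", "no", "yes", "yes"])

ensemble = adaboost(X, y, n_rounds=3)
print([predict(ensemble, row) for row in X])
```

On this toy data the best single stump misclassifies two of the six points, while the weighted vote of three stumps classifies all six correctly, which is the effect boosting is meant to achieve.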

Conclusion:

Based on our analysis of the direct marketing campaign data from the Portuguese banking institution, several key insights have emerged:

  1. Random Forest (RF) Model:

    • Mean Test AUC: ~0.915
    • Standard Deviation of Test AUC: ~0.009
    • 95% Confidence Interval for Test AUC: (0.904, 0.928)
  2. Linear Probability Model (LPM):

    • Mean Test AUC: ~0.890
    • Standard Deviation of Test AUC: ~0.010
    • 95% Confidence Interval for Test AUC: (0.873, 0.904)
  3. Variable Importance:

    Through the Mean Decrease Impurity (MDI) analysis, we identified the top predictors contributing to the classification task. Notably, attributes such as 'duration', 'month', 'balance', 'age', 'day', and 'job' emerged as the most influential factors in predicting client subscription behavior.

  4. Classification Performance:

    The out-of-bag (OOB) confusion matrix from 100 runs of the RF model shows 74 true positives (TP) and 1440 true negatives (TN), against 116 false positives (FP) and 35 false negatives (FN). Most clients are classified correctly, but false positives outnumber true positives, so the precision of the 'yes' predictions is the clearest area for potential model improvement.
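The four counts reported above can be turned into the usual summary rates. The short Python sketch below does that arithmetic; it assumes that the positive class is 'yes' (the client subscribed), which the text implies but does not state outright.

```python
# Counts taken from the OOB confusion matrix reported above.
tp, tn, fp, fn = 74, 1440, 116, 35

total = tp + tn + fp + fn

metrics = {
    "accuracy": (tp + tn) / total,   # share of all predictions that are correct
    "precision": tp / (tp + fp),     # share of predicted 'yes' that are truly 'yes'
    "recall": tp / (tp + fn),        # share of true 'yes' that the model finds
    "specificity": tn / (tn + fp),   # share of true 'no' that the model finds
}

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy is about 0.909, precision about 0.389, recall about 0.679, and specificity about 0.925. The high accuracy is driven largely by the many true negatives, which is why precision and recall are worth reporting separately for the 'yes' class.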

Overall, our findings show that both models predict client subscription behavior well in direct marketing campaigns for term deposits, with Random Forest reaching the higher mean test AUC (~0.915, against ~0.890 for the Linear Probability Model). These results can guide the banking institution's marketing strategy, supporting more targeted campaigns and more efficient resource allocation to increase client subscription rates.